|
The hash join is an example of a join algorithm and is used in the implementation of a relational database management system. The task of a join algorithm is to find, for each distinct value of the join attribute, the set of tuples in each relation which have that value. Hash joins require an equijoin predicate (a predicate comparing values from one table with values from the other table using the equals operator '='). == Classic hash join == The classic hash join algorithm for an inner join of two relations proceeds as follows: * First prepare a hash table of the smaller relation. The hash table entries consist of the join attribute and its row. Because the hash table is accessed by applying a hash function to the join attribute, it will be much quicker to find a given join attribute's rows by using this table than by scanning the original relation. * Once the hash table is built, scan the larger relation and find the relevant rows from the smaller relation by looking in the hash table. The first phase is usually called the "build" phase, while the second is called the "probe" phase. Similarly, the join relation on which the hash table is built is called the "build" input, whereas the other input is called the "probe" input. It is like merge join algorithm. This algorithm is simple, but it requires that the smaller join relation fits into memory, which is sometimes not the case. A simple approach to handling this situation proceeds as follows: # For each tuple in the build input ## Add to the in-memory hash table ## If the size of the hash table equals the maximum in-memory size: ### Scan the probe input , and add matching join tuples to the output relation ### Reset the hash table # Do a final scan of the probe input and add the resulting join tuples to the output relation This is essentially the same as the block nested loop join algorithm. This algorithm scans more times than necessary. 抄文引用元・出典: フリー百科事典『 ウィキペディア(Wikipedia)』 ■ウィキペディアで「hash join」の詳細全文を読む スポンサード リンク
|